The problem of distributing the workload on a parallel computer to minimize the overall runtime
is known as the Multiprocessor Scheduling Problem. It is NP-hard, but like many other NP-hard
problems, the average hardness of random instances displays an "easy-hard" phase transition.
The transition in Multiprocessor Scheduling can be analyzed using elementary notions from
crystallography (Bravais lattices) and statistical mechanics (Potts vectors). The analysis reveals
the control parameter of the transition and its critical value, including finite-size corrections.
The transition is identified in the performance of practical scheduling algorithms.



One of the major problems in parallel computing is 
load balancing, the even distribution of the workload on 
the processors of a parallel computer. The goal is to find 
a schedule, i.e. an assignment of N tasks to q processors so 
as to minimize the largest task finishing time (makespan).
This is a very hard problem, since individual tasks may 
depend on each other and the time a task runs on a given 
processor may not be known in advance. 

The simplest variant of a scheduling problem is
known as the Multiprocessor Scheduling Problem,
MSP [1]. Here the $N$ tasks are independent, all $q$
processors are equal, and the running time $a_i \in \mathbb{N}$ of
each task is known in advance. A schedule is a map
$s : \{1, \ldots, N\} \to \{1, \ldots, q\}$ with $s_i = \alpha$ denoting that
task $i$ is assigned to processor $\alpha$. The problem then is to
minimize the makespan

$$ T(s_1, s_2, \ldots, s_N) = \max_{\alpha} \left\{ A_\alpha = \sum_{j=1}^{N} a_j \, \delta(s_j - \alpha) \right\}. \tag{1} $$

$A_\alpha$ is the total workload of processor $\alpha$ and $\delta$ is the
usual Kronecker symbol. MSP belongs to the class of NP-hard
optimization problems [2], which basically means
that no one has ever found a scheduling algorithm that
is significantly faster than exhaustive search through all
$O(q^N)$ schedules, and probably no one ever will. With
computational costs that increase exponentially with the
problem size $N$, MSP must be considered intractable.
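The definition of the makespan and the cost of exhaustive search can be made concrete with a minimal sketch. The function names and the task sizes below are illustrative assumptions, and processors are numbered from 0 rather than from 1.

```python
from itertools import product

def makespan(a, s, q):
    """Largest processor workload A_alpha for schedule s (processors 0..q-1)."""
    loads = [0] * q
    for a_j, s_j in zip(a, s):
        loads[s_j] += a_j
    return max(loads)

def brute_force(a, q):
    """Exhaustive search over all q**N schedules; feasible only for small N."""
    best = min(product(range(q), repeat=len(a)),
               key=lambda s: makespan(a, s, q))
    return makespan(a, best, q), best

a = [8, 7, 6, 5, 4]            # hypothetical task sizes
T, s = brute_force(a, q=2)     # visits 2**5 = 32 schedules
print(T, s)
```

The number of schedules visited grows as $q^N$, which is why this approach is useful only as a reference for very small instances.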

NP-hardness refers to worst-case scenarios. On the
other hand, randomly generated instances of NP-hard
problems often are surprisingly easy to solve, and MSP
is no exception. Numerical experiments [?] with the $a_i$
being random $B$-bit integers reveal two distinct regimes:
for small values of $\kappa = B/N$, typical instances can be
solved without exponential search; for large values of $\kappa$,
exponential search is mandatory. The transition from
the "easy" to the "hard" regime gets sharper as $N$
increases, and in the limit $N \to \infty$ it happens at a threshold
value $\kappa_c(q)$. Similar phase transitions have been observed
in many other combinatorial optimization problems [9, ...].
A transition in the typical algorithmic complexity
normally corresponds to a structural change in the typical
instances of the random ensemble. The latter in turn can
be analyzed using the methods and notions from statistical
mechanics [?]. An outstanding example of the fruitfulness
of this interdisciplinary approach is the analysis
of random $K$-Satisfiability with the replica [?] and the
cavity method [?].

* E-mail: heiko.bauke@student.uni-magdeburg.de
† E-mail: stephan.mertens@physik.uni-magdeburg.de
‡ E-mail: andreas.engel@physik.uni-magdeburg.de

The structural change in MSP that underlies the
algorithmic transition is the appearance of perfect
schedules. Let $r = \sum_j a_j \bmod q$. A schedule with

$$ A_\alpha = \left\lfloor \frac{1}{q} \sum_{j=1}^{N} a_j \right\rfloor + \begin{cases} 1 & \text{if } 1 \le \alpha \le r \\ 0 & \text{if } r < \alpha \le q \end{cases} \tag{2} $$

(and its $\binom{q}{r}$ equivalent rearrangements) is called perfect
since it obviously minimizes the makespan $T$ (Eq. (1)).
Whenever an algorithm runs into a perfect schedule it
can stop the search, possibly before having explored an
exponential part of the search space. Numerical simulations
in fact indicate that the probability that a random
instance has a perfect schedule decreases from 1 for $\kappa = 0$
to 0 for $\kappa \gg 1$, and for large $N$ the probability jumps
abruptly from 1 to 0 as $\kappa$ crosses a critical value $\kappa_c(q)$.
In this letter we will calculate $\kappa_c(q)$.
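The notion of a perfect schedule can be illustrated with a short sketch. The helper names and the example instance are assumptions made for illustration; processors are numbered from 0, and a schedule counts as perfect if its workloads match Eq. (2) up to a relabelling of the processors.

```python
from itertools import product

def perfect_workloads(a, q):
    """Workloads of a perfect schedule, Eq. (2), in decreasing order."""
    base, r = divmod(sum(a), q)
    return [base + 1] * r + [base] * (q - r)

def is_perfect(a, s, q):
    """True if schedule s realizes the workloads of Eq. (2) up to relabelling."""
    loads = [0] * q
    for a_j, s_j in zip(a, s):
        loads[s_j] += a_j
    return sorted(loads, reverse=True) == perfect_workloads(a, q)

def count_perfect(a, q):
    """Number of perfect schedules, by enumeration of all q**N schedules."""
    return sum(is_perfect(a, s, q) for s in product(range(q), repeat=len(a)))

a = [5, 4, 3, 3, 2]                      # hypothetical task sizes, sum = 17
print(perfect_workloads(a, 3))           # two processors carry 6, one carries 5
print(is_perfect(a, (2, 0, 1, 1, 0), 3))
print(count_perfect(a, 3))
```

Counting perfect schedules over many random instances at fixed $\kappa = B/N$ is one way to estimate the probability discussed above, although the enumeration is limited to small $N$.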

The MSP for $q = 2$ is known as the Number Partitioning
Problem, NPP [?]. The transition point of the NPP,
$\kappa_c(2)$, has been calculated within the canonical formalism
of statistical mechanics [10], and the results have been
confirmed recently by rigorous proofs [11]. In principle
one could extend the method of Ref. [10] to general $q$, but
here we will follow a microcanonical approach.

A first estimate for the critical $\kappa$ can be obtained from
a simple heuristic argument [?]: Given that the
$a_i$ are $\kappa N$-bit integers, the workloads defined by
Eq. (1) are (neglecting carry bits for the moment) also
$\kappa N$-bit integers. The probability that a randomly chosen
schedule $\{s_1, s_2, \ldots, s_N\}$ realizes a particular value of
$A_1$ is therefore $2^{-\kappa N}$. Neglecting correlations, the chance
to realize all the workloads defined in Eq. (2) is hence
$2^{-(q-1)\kappa N}$ ($A_q$ is fixed implicitly). Since there are $q^N$
different schedules, the number of perfect ones is roughly
given by $q^N 2^{-(q-1)\kappa N}$. The occurrence of a phase transition
in MSP is now easily understood. If $\kappa$ is small we
expect an exponential number of perfect solutions, but
for $\kappa$ larger than a critical value

$$ \kappa_c = \frac{\log_2 q}{q-1} \tag{3} $$

the number of perfect solutions is exponentially small.
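The heuristic estimate is simple enough to evaluate directly. The following sketch (function names are illustrative assumptions) computes the critical ratio of Eq. (3) and the base-2 logarithm of the estimated number of perfect schedules, which changes sign at $\kappa = \kappa_c$.

```python
import math

def kappa_c_heuristic(q):
    """Critical bits-per-task ratio of Eq. (3): log2(q) / (q - 1)."""
    return math.log2(q) / (q - 1)

def log2_expected_perfect(N, q, kappa):
    """log2 of the heuristic estimate q**N * 2**(-(q-1)*kappa*N)."""
    return N * (math.log2(q) - (q - 1) * kappa)

for q in (2, 3, 4):
    kc = kappa_c_heuristic(q)
    # Below kappa_c the estimate grows with N, above it the estimate shrinks.
    print(q, kc,
          log2_expected_perfect(20, q, 0.5 * kc),
          log2_expected_perfect(20, q, 1.5 * kc))
```

For $q = 2$ the estimate gives $\kappa_c = 1$, the leading-order value for number partitioning.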

The first step of our approach is a convenient encoding
of the schedules and the cost function. The workloads
on the processors are not independent: what is removed
from one processor must be done by another. We can
incorporate this constraint automatically by encoding the
schedule as Potts vectors [12]. Potts vectors $\vec{e}^{(\alpha)}$ are
$(q-1)$-dimensional unit vectors pointing at the $q$ corners
of a $(q-1)$-dimensional hypertetrahedron (see Fig. 1 for
the case $q = 3$). This implies that the angle between two
Potts vectors is the same for all pairs of different vectors,

$$ \vec{e}^{(\alpha)} \cdot \vec{e}^{(\beta)} = \frac{q\,\delta(\alpha - \beta) - 1}{q - 1}. \tag{4} $$
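Explicit coordinates for the Potts vectors are not needed in what follows, but they are easy to construct. The sketch below uses one possible recursive choice of coordinates (an assumption made for illustration, any rotation of it works equally well) and checks the scalar products of Eq. (4) numerically.

```python
import math

def potts_vectors(q):
    """Return q unit vectors in q-1 dimensions with mutual dot product -1/(q-1).

    The first vector points along the first axis; the others share the first
    coordinate -1/(q-1) and carry a rescaled copy of the (q-1)-state vectors.
    """
    if q == 2:
        return [[1.0], [-1.0]]
    c = -1.0 / (q - 1)
    scale = math.sqrt(1.0 - c * c)
    rest = [[c] + [scale * x for x in v] for v in potts_vectors(q - 1)]
    return [[1.0] + [0.0] * (q - 2)] + rest

def dot(u, v):
    """Euclidean scalar product of two coordinate lists."""
    return sum(x * y for x, y in zip(u, v))

e = potts_vectors(3)
for alpha in range(3):
    # Diagonal entries are 1, off-diagonal entries are -1/2 for q = 3.
    print([round(dot(e[alpha], e[beta]), 12) for beta in range(3)])
```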



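A schedule is encoded by $N$ Potts vectors $\vec{s}_j$, where
$\vec{s}_j = \vec{e}^{(\alpha)}$ means that task $j$ is assigned to processor $\alpha$, i.e.

$$ \delta(s_j - \alpha) = \frac{1}{q} \left[ 1 + (q-1)\, \vec{s}_j \cdot \vec{e}^{(\alpha)} \right]. \tag{5} $$

The target vector

$$ \vec{E}(\{s\}) = \sum_{j=1}^{N} a_j \vec{s}_j \tag{6} $$

encodes the workload of all processors,

$$ A_\alpha = \frac{1}{q} \sum_{j=1}^{N} a_j + \frac{q-1}{q}\, \vec{E} \cdot \vec{e}^{(\alpha)}, \tag{7} $$

and minimizing $T$ (Eq. (1)) is equivalent to minimizing $\vec{E}$
with respect to the supremum norm [13].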
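The relation between the target vector and the workloads can be checked with exact rational arithmetic. This sketch assumes only the scalar products of Potts vectors, $(q\,\delta(\alpha-\beta) - 1)/(q-1)$, so no explicit coordinates are required; the function names and the example instance are illustrative, and processors are numbered from 0.

```python
from fractions import Fraction

def potts_dot(alpha, beta, q):
    """Scalar product of two Potts vectors: (q*delta - 1) / (q - 1)."""
    return Fraction(q * (alpha == beta) - 1, q - 1)

def workloads_from_target(a, s, q):
    """Workloads A_alpha recovered from projections of E = sum_j a_j s_j.

    Uses A_alpha = (1/q) * sum_j a_j + ((q-1)/q) * E . e^(alpha).
    """
    total = sum(a)
    loads = []
    for alpha in range(q):
        # E . e^(alpha), evaluated term by term from the scalar products.
        E_dot_e = sum(a_j * potts_dot(s_j, alpha, q) for a_j, s_j in zip(a, s))
        loads.append(Fraction(total, q) + Fraction(q - 1, q) * E_dot_e)
    return loads

a = [5, 4, 3, 3, 2]      # hypothetical task sizes
s = [2, 0, 1, 1, 0]      # task j runs on processor s[j]
print(workloads_from_target(a, s, q=3))
```

The result agrees with summing the task sizes per processor directly, which is the content of the workload formula above.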

FIG. 1: Lattice of target vectors for $q = 3$ with the three Potts
vectors (gray) and the two primitive vectors $\vec{b}_\alpha$ (black). The
white (gray, black) lattice points correspond to $\sum_j a_j \bmod 3 = 0\,(1, 2)$.

For integer values $a_j$, the minimal change of a schedule
is to remove 1 from one processor and add it to one of the
other $q-1$ processors. Hence the possible values of $\vec{E}(\{s\})$
are points on a $(q-1)$-dimensional Bravais lattice with
primitive vectors

$$ \vec{b}_\alpha = \vec{e}^{(\alpha)} - \vec{e}^{(q)}, \qquad \alpha = 1, \ldots, q-1. \tag{8} $$



These primitive vectors span a sublattice of the lattice
generated by $q-1$ Potts vectors. The sublattice contains
every $q$th point of the Potts lattice, and correspondingly
there are $q$ classes of such lattice points depending on
$\sum_j a_j \bmod q$ (Fig. 1). The volume $V(q)$ of the primitive
cell in our sublattice can be calculated from the Gram
determinant,

$$ V^2(q) = \det(\vec{b}_\alpha \cdot \vec{b}_\beta) = \frac{q^q}{(q-1)^{q-1}}. \tag{9} $$
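The Gram determinant can be verified exactly for small $q$. The sketch below builds the Gram matrix of the primitive vectors $\vec{e}^{(\alpha)} - \vec{e}^{(q)}$ from the Potts scalar products and evaluates its determinant over the rationals; the helper names are illustrative assumptions.

```python
from fractions import Fraction

def gram_matrix(q):
    """Gram matrix b_alpha . b_beta with b_alpha = e^(alpha) - e^(q)."""
    def e_dot(a, b):
        return Fraction(q * (a == b) - 1, q - 1)
    last = q - 1                      # index of e^(q) when counting from 0
    return [[e_dot(a, b) - e_dot(a, last) - e_dot(last, b) + e_dot(last, last)
             for b in range(q - 1)] for a in range(q - 1)]

def determinant(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in m]
    det = Fraction(1)
    for i in range(len(m)):
        pivot = next((r for r in range(i, len(m)) if m[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, len(m)):
            factor = m[r][i] / m[i][i]
            m[r] = [x - factor * y for x, y in zip(m[r], m[i])]
    return det

for q in range(2, 7):
    # Compare the determinant with q**q / (q-1)**(q-1).
    print(q, determinant(gram_matrix(q)), Fraction(q**q, (q - 1)**(q - 1)))
```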

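The average number $\Omega$ of schedules with target $\vec{E}$ is

$$ \Omega(\vec{E}) = \mathop{\mathrm{Tr}}_{\{s\}} \left\langle \delta\!\left( \vec{E} - \sum_{j=1}^{N} a_j \vec{s}_j \right) \right\rangle, \tag{10} $$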



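where $\langle \cdot \rangle$ denotes averaging over the i.i.d. random numbers
$a_j$. For a fixed schedule $\{\vec{s}_j\}$ and large $N$, the sum
$\sum_{j=1}^{N} a_j \vec{s}_j$ is Gaussian with mean

$$ \langle \vec{E} \rangle = \langle a \rangle \sum_{j=1}^{N} \vec{s}_j =: \langle a \rangle \vec{M} \tag{11} $$

and variance of the components $\vec{E} = (E_1, \ldots, E_{q-1})$

$$ \langle E_i E_k \rangle - \langle E_i \rangle \langle E_k \rangle = \sigma_a^2 \sum_{j=1}^{N} (\vec{s}_j)_i (\vec{s}_j)_k =: \sigma_a^2\, g_{i,k} \tag{12} $$

with $\sigma_a^2 = \langle a^2 \rangle - \langle a \rangle^2$. "Magnetization" $\vec{M}$ and variance
matrix $g = (g_{i,k})$ depend on the schedule only through
the numbers $N_\alpha = \sum_j \delta(s_j - \alpha)$ of tasks assigned to
processor $\alpha$:

$$ \vec{M} = \sum_{\alpha=1}^{q} N_\alpha\, \vec{e}^{(\alpha)}, \qquad g_{i,k} = \sum_{\alpha=1}^{q} N_\alpha\, e_i^{(\alpha)} e_k^{(\alpha)}. \tag{13} $$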



The trace over $\{s\}$ is basically an average over all trajectories
of a random walk in $q-1$ dimensions. For large
$N$ this average is dominated by trajectories with $\vec{M} = 0$,
i.e. $N_\alpha = N/q$. For these trajectories, the matrix $g$ is
diagonal,

$$ g_{i,k} = \frac{N}{q-1}\, \delta(i - k), \tag{14} $$

and we have basically an independent random walk in
each of the $q-1$ directions of our lattice. The probability
to occupy after $N$ steps a position $E_\alpha$ away from the
origin in direction $\alpha$ is therefore given by

$$ p(E_\alpha) = \sqrt{\frac{q-1}{2\pi N \langle a^2 \rangle}}\, \exp\!\left( -\frac{(q-1) E_\alpha^2}{2N \langle a^2 \rangle} \right). \tag{15} $$
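That balanced schedules give a vanishing magnetization and a diagonal matrix $g$ can be confirmed numerically. The sketch below assumes one particular recursive coordinate choice for the Potts vectors (an illustrative assumption, not necessarily the coordinates used by the authors) and evaluates $\vec{M}$ and $g$ from the task counts $N_\alpha$.

```python
import math

def potts_vectors(q):
    """q unit vectors in q-1 dimensions with mutual dot product -1/(q-1)."""
    if q == 2:
        return [[1.0], [-1.0]]
    c = -1.0 / (q - 1)
    scale = math.sqrt(1.0 - c * c)
    rest = [[c] + [scale * x for x in v] for v in potts_vectors(q - 1)]
    return [[1.0] + [0.0] * (q - 2)] + rest

def magnetization_and_g(counts):
    """M and g computed from the task counts N_alpha per processor."""
    q = len(counts)
    e = potts_vectors(q)
    d = q - 1
    M = [sum(n * e[al][i] for al, n in enumerate(counts)) for i in range(d)]
    g = [[sum(n * e[al][i] * e[al][k] for al, n in enumerate(counts))
          for k in range(d)] for i in range(d)]
    return M, g

M, g = magnetization_and_g([4, 4, 4])    # balanced schedule: N = 12, q = 3
print(M)                                 # numerically zero
print(g)                                 # close to (N / (q - 1)) times the identity
```

For unbalanced counts the same function returns a nonzero $\vec{M}$ and, in general, a non-diagonal $g$.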
The probability of finding the $q-1$ walkers at $\vec{E}$ reads

$$ p(\vec{E}) = \left( \frac{q-1}{2\pi N \langle a^2 \rangle} \right)^{(q-1)/2} \exp\!\left( -\frac{(q-1)\, |\vec{E}|^2}{2N \langle a^2 \rangle} \right). \tag{16} $$



To get the number of schedules with given target vector
$\vec{E}$ we have to multiply the density $p(\vec{E})$ by $q^N$ and the
volume $V(q)$ of the primitive cell of our Bravais lattice,

$$ \Omega(\vec{E}) = \frac{q^N q^{q/2}}{\left( 2\pi N \langle a^2 \rangle \right)^{(q-1)/2}} \exp\!\left( -\frac{q-1}{2N \langle a^2 \rangle}\, |\vec{E}|^2 \right). \tag{17} $$

For target vectors with $|\vec{E}| = O(1)$ the distribution looks
essentially flat as $N \to \infty$, i.e. a perfect target is realized by
about as many schedules as any given suboptimal target with
$|\vec{E}| = O(1)$.

The density of scalar quantities like $|\vec{E}|^2$ gets a factor
$|\vec{E}|^{q-2}$ from the volume element in $(q-1)$-dimensional
spherical coordinates. For $q > 2$ this leads to a maximum
of the microcanonical entropy at some value $|\vec{E}| > 0$, and
this maximum gets sharper with increasing $q$, a scenario
that has been observed in Monte Carlo simulations [14].
However, this implies no fundamental difference between
$q = 2$ and $q > 2$ as claimed in Ref. [14], but is of purely
geometrical origin.

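For the location of the phase transition we can concentrate
on perfect schedules, i.e. we set $|\vec{E}| = O(1)$, and we
assume that the $a_j$'s are uniformly distributed $\kappa N$-bit
integers. From

$$ \langle a^2 \rangle = \frac{1}{3}\, 2^{2\kappa N} \left( 1 - O(2^{-\kappa N}) \right) \tag{18} $$

we get

$$ \log_2 \Omega(0) = N (q-1) \cdot (\kappa_c - \kappa) \tag{19} $$

with

$$ \kappa_c = \frac{\log_2 q}{q-1} - \frac{1}{2N} \log_2\!\left( \frac{2\pi N}{3\, q^{q/(q-1)}} \right). \tag{20} $$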
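The step from the Gaussian count to the critical ratio is pure algebra and can be checked numerically. The sketch below evaluates $\log_2$ of $q^N q^{q/2} / (2\pi N \langle a^2 \rangle)^{(q-1)/2}$ with $\langle a^2 \rangle = 2^{2\kappa N}/3$ (dropping the exponentially small correction) and compares it with $N(q-1)(\kappa_c - \kappa)$. Function names and the parameter values are illustrative assumptions.

```python
import math

def kappa_c(N, q):
    """Critical ratio with finite-size term:
    log2(q)/(q-1) - log2(2*pi*N / (3*q**(q/(q-1)))) / (2*N)."""
    leading = math.log2(q) / (q - 1)
    correction = math.log2(2 * math.pi * N / (3 * q ** (q / (q - 1)))) / (2 * N)
    return leading - correction

def log2_omega0(N, q, kappa):
    """log2 of the number of schedules at E = 0 for <a^2> = 2**(2*kappa*N) / 3."""
    log2_a2 = 2 * kappa * N - math.log2(3)
    return ((N + q / 2) * math.log2(q)
            - (q - 1) / 2 * (math.log2(2 * math.pi * N) + log2_a2))

N, q, kappa = 30, 3, 0.5       # hypothetical parameters
print(log2_omega0(N, q, kappa))
print(N * (q - 1) * (kappa_c(N, q) - kappa))   # same value up to rounding
```

For $q = 2$ the finite-size term becomes $\log_2(\pi N / 6) / (2N)$, since $3 \cdot 2^2 = 12$.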



The first term corresponds to the result of the heuristic
argument from above (Eq. (3)). The second term represents
finite-size corrections. For $q = 2$, Eqs. (19) and
(20) reduce to the known results for number partitioning
[10].

According to Eq. (19) the microcanonical entropy
$\log \Omega(\vec{E})$ is a linear function of $\kappa$ for large $N$. In fact this
linearity already holds for rather small values of $N$, as
can be seen from numerical enumerations (Fig. 2). Linear
regression on the data for $\log \Omega(\vec{E})$ gives numerical
values for $\kappa_c(N)$. These values in turn agree well with
the predictions of Eq. (20) for larger values of $N$ (Fig. 3).

FIG. 2: Numerical measurements of $\Omega$ for $q = 3$. Symbols are
averages over $10^3$ random instances with $\sum_j a_j \bmod q = 0$; lines
are given by Eq. (?). The error bars of the enumeration data are
smaller than the symbols.

FIG. 3: $\kappa_c$ from linear regression of the numerical data for
$\log \Omega$ (symbols) compared to Eq. (20) (curves).

Up to this point we have discussed static properties
of random MSP. How do they affect the dynamical behavior
of search algorithms? An obvious algorithm is to
sort the tasks $a_j$ in decreasing order and to assign the
first (and largest) task to processor 1. The next tasks
are each assigned to the processor with the smallest total
workload so far. Proceed until all tasks are assigned.
Ties are broken by selecting the processor with the lower
rank. This so-called greedy heuristic usually produces
poor schedules, but it can be extended to an algorithm
that yields the optimum schedule. Instead of assigning a
task to one processor, the extended algorithm branches:
in the first branch it follows the heuristic rule and assigns
the task to the processor with the lowest workload,
in the second branch it selects the processor with the
second lowest workload, and so on. Ignoring ties, this
algorithm generates the complete search tree with its $q^N$
leaves and hence will eventually find the true optimum.
This algorithm is known as the Complete Greedy Algorithm
(CGA) [15]. Of course it is worst-case exponential, but
with pruning we can hope to achieve a speedup for the
typical case. The most efficient pruning rule is to simply
stop the moment one hits upon a perfect schedule. A less
efficient pruning rule applies if the sum of the unassigned
tasks is smaller than the difference between the current
maximal and minimal workloads. In this case one can
simply assign all remaining tasks to the processor with
the minimum workload.
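The greedy heuristic and its complete extension can be sketched as follows. This is a minimal illustration, not the implementation used for the simulations: function names are assumptions, processors are numbered from 0, the search stops once the lower bound $\lceil \sum_j a_j / q \rceil$ is reached (which every perfect schedule attains), and one extra cut that is not described in the text is added, namely abandoning a branch whose largest workload already matches the best makespan found so far.

```python
def greedy(a, q):
    """Greedy heuristic: largest task first, onto the least loaded processor."""
    loads = [0] * q
    for a_j in sorted(a, reverse=True):
        loads[loads.index(min(loads))] += a_j   # ties go to the lowest index
    return max(loads)

def complete_greedy(a, q):
    """Complete greedy search; returns (optimal makespan, visited nodes)."""
    tasks = sorted(a, reverse=True)
    total = sum(tasks)
    perfect = -(-total // q)             # ceil(total / q)
    suffix = [0] * (len(tasks) + 1)      # suffix[k] = sum of tasks k, k+1, ...
    for k in range(len(tasks) - 1, -1, -1):
        suffix[k] = suffix[k + 1] + tasks[k]
    best = [total]                       # everything on one processor
    nodes = [0]

    def search(k, loads):
        nodes[0] += 1
        top = max(loads)
        if top >= best[0]:               # extra cut: branch cannot improve
            return
        if suffix[k] <= top - min(loads):
            # Remaining tasks fit on the least loaded processor.
            best[0] = top
            return
        for p in sorted(range(q), key=lambda i: loads[i]):   # lowest load first
            loads[p] += tasks[k]
            search(k + 1, loads)
            loads[p] -= tasks[k]
            if best[0] == perfect:       # lower bound reached: stop searching
                return

    search(0, [0] * q)
    return best[0], nodes[0]

a = [8, 7, 6, 5, 4]                      # hypothetical task sizes
print(greedy(a, 2), complete_greedy(a, 2))
```

On this small instance the greedy heuristic misses the optimum, while the complete search finds it after visiting only a fraction of the $2^5$ leaves.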
FIG. 4: Median of the number of nodes traversed by the complete
greedy algorithm for $q = 3$ and fixed number of bits $B$. The
circles mark the critical system size $N_c$ given by Eq. (21).

In our simulations we fix the number $B$ of bits in the
$a_j$'s and measure the number of nodes traversed by CGA
as a function of $N$. In this case $\kappa_c$ translates into a critical
value $N_c$ for the system size, $N_c$ being the solution of

$$ \frac{B}{N_c} = \frac{\log_2 q}{q-1} - \frac{1}{2N_c} \log_2\!\left( \frac{2\pi N_c}{3\, q^{q/(q-1)}} \right). \tag{21} $$
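The condition for the critical system size has no closed-form solution, but it is easy to solve numerically. The sketch below uses bisection and treats $N_c$ as a real number; the bracket, the tolerance, and the value $B = 16$ are assumptions chosen for illustration.

```python
import math

def kappa_c(N, q):
    """log2(q)/(q-1) - log2(2*pi*N / (3*q**(q/(q-1)))) / (2*N)."""
    leading = math.log2(q) / (q - 1)
    return leading - math.log2(2 * math.pi * N / (3 * q ** (q / (q - 1)))) / (2 * N)

def critical_size(B, q, tol=1e-10):
    """Solve B / N_c = kappa_c(N_c, q) for N_c by bisection."""
    def f(N):
        return B - N * kappa_c(N, q)      # the root of f is N_c
    lo = 1.0
    hi = 2.0 * B * (q - 1) / math.log2(q) + 100.0   # generous upper bracket
    if f(lo) <= 0 or f(hi) >= 0:
        raise ValueError("no sign change in the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

Nc = critical_size(16, 3)                 # hypothetical B = 16 bits, q = 3
print(Nc, 16 * 2 / math.log2(3))          # with and without finite-size term
```

The finite-size term shifts $N_c$ noticeably above the leading-order estimate $B(q-1)/\log_2 q$ for moderate $B$.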



Fig. 4 shows a typical result for $q = 3$. For $N < N_c$
the number of nodes traversed by CGA increases like
$3^{xN}$ with $x \approx 0.84$. The fact that $x < 1$ is due to the
pruning. As soon as $N > N_c$, the pruning by perfect
solutions takes effect and the growth slows down significantly.
Eventually the search costs even decrease with
increasing $N$. This indicates that the algorithm takes
advantage of the growing number of perfect solutions,
although their relative number still decays exponentially.
There are algorithms that outperform simple CGA, but
the differences show only for $N > N_c$: with better
algorithms the subexponential growth and the decrease of
the search costs set in at values of $N$ closer to but always
above $N_c$.

To conclude, we have shown that Multiprocessor
Scheduling has a phase transition controlled by the
numerical resolution (number of bits) of the individual
task sizes. The "easy" phase is characterized by an
exponential number of perfect schedules, the "hard" phase by
the absence of perfect schedules. Note that for $q = 2$ it
has been demonstrated that deep in the "hard" phase the
system behaves essentially like a random energy model
[?]. This fact allowed the calculation of the complete
statistics of the optimal solutions and it explains the bad
performance of heuristic algorithms. It would be
interesting to extend this approach to $q > 2$.